Acronym List

Command and Data Handling (CADH)
Consultative Committee for Space Data Systems (CCSDS)
Commercial Off The Shelf (COTS)
Dynamic Random Access Memory (DRAM)
Error Detection and Correction (EDAC)
Electrical, Electronic and Electromechanical (EEE)
Geosynchronous Equatorial Orbit (GEO)
Goddard Space Flight Center (GSFC)
Integrated Circuits (ICs)
International Space Station (ISS)
NASA Jet Propulsion Laboratory (JPL)
Low Earth Orbit (LEO)
Military/Aerospace (Mil/Aero)
Mars Science Laboratory (MSL)
NASA Electronic Parts and Packaging (NEPP) Program
printed circuit boards (PCBs)
physics of failure (PoF)
real-time operating system (RTOS)
Solar Anomalous Magnetospheric Particle Explorer (SAMPEX)
Synchronous Dynamic Random Access Memory (SDRAM)
Small Explorer Data System (SEDS)
Single Event Effects (SEE)
single event functional interrupts (SEFIs)
single event upset (SEU)
Small Explorer (SMEX)
surface mount technology (SMT)
Static Random Access Memory (SRAM)
Solid State Recorders (SSRs)
Size, Weight, and Power (SWaP)
Ultraviolet (UV)
Virtual Real-Time Executive (VRTX)
Abstract/Outline


• NASA has a long history of using commercial
grade electronics in space. In this presentation:

• We will provide a brief history of NASA’s trends
and approaches to commercial grade electronics
focusing on processing and memory systems.

— This will include providing summary information on the space
hazards to electronics as well as NASA mission trade space.

— We will also discuss developing recommendations for risk
management approaches to Electrical, Electronic and
Electromechanical (EEE) parts usage in space.

— Two examples will be provided focusing on a near-Earth polar-orbiting
spacecraft as well as a mission to Mars.

— The final portion will discuss emerging trends impacting
usage.
Sample Space Hazards by Orbit Type

Note that this is not a complete space hazard list.
Other items such as operation in a vacuum, UV exposure, etc., aren’t included.
Assurance for EEE Parts

• Assurance is
— Knowledge of
• The supply chain and manufacturer of the product,
• The manufacturing process and its controls, and,
• The physics of failure (PoF) related to the technology.
— Statistical process and inspection via
• Testing, inspection, physical analyses and modeling.
— Understanding the application and environmental
conditions for device usage.
• This includes:
— Radiation,
— Lifetime,
— Temperature,
— Vacuum, etc., as well as,
— Device application and appropriate derating criteria.
Reliability and Availability

• Reliability (Wikipedia)
— The ability of a system or component to perform its
required functions under stated conditions for a
specified period of time.

• Availability (Wikipedia)
— The degree to which a system, subsystem, or equipment
is in a specified operable and committable state at the
start of a mission, when the mission is called for at an
unknown, i.e., a random, time. Simply put, availability is
the proportion of time a system is in a functioning
condition. This is often described as a mission capable
rate.

• The bottom line:
— Does it work as expected for as long as needed and
when it’s needed!
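
The two definitions above can be put into rough numbers. The following is a minimal sketch, assuming a constant failure rate (exponential model) for reliability and the common steady-state ratio MTBF / (MTBF + MTTR) for availability. Neither model nor any of the numbers comes from the presentation; they are illustrative assumptions only.

```python
import math


def reliability(t_hours: float, mtbf_hours: float) -> float:
    """Probability of running t_hours without failure.

    Assumes a constant failure rate (exponential model), which is an
    illustrative simplification, not a claim about any real part.
    """
    return math.exp(-t_hours / mtbf_hours)


def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time the system is in a functioning condition."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical values, chosen only to show the calculation.
mission_hours = 3 * 365 * 24   # a three-year mission
mtbf = 50_000.0                # mean time between failures, hours
mttr = 2.0                     # mean time to recover (e.g., a reset), hours

print(f"Reliability over mission: {reliability(mission_hours, mtbf):.3f}")
print(f"Steady-state availability: {steady_state_availability(mtbf, mttr):.5f}")
```

Under these assumptions a system can have modest mission reliability yet very high availability, as long as it recovers quickly from each fault. That is the distinction the bottom line draws between "as long as needed" and "when it's needed".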
NASA COTS Challenges

• Unique Space Usage Constraints
— Environment hazards
— Servicing (limited options)
— Wide range of mission lifetimes and orbits
— System availability (not just reliability) requirements (criticality of function and
timing)


Solution Details 
For a small market (compared to commercial),
space electronics place big demands on the semiconductor manufacturer.
NASA Historically Uses Mil/Aero Grade

• Prime reason has been the detailed and relevant
knowledge about the performance and reliability of
the actual parts to be flown.

• Mil/Aero uses a standardized set of manufacturer
qualification tests that provide confidence in a
device’s reliability for a wide range of space
conditions.

— The test levels are set such that they bound the majority of
environment and lifetime exposures for space missions with the
exception of extreme environments and, in some cases, radiation
tolerance.

— Mil/Aero also allows manufacturers to perform one set of
qualification tests rather than a tailored set for each specific
mission environment and lifetime profile.

— As noted already, other industries such as automotive and medical
have their own sets of screening and qualification levels.
The Move to COTS in Space

• Up until the 1990 timeframe, NASA used COTS mainly in
cases where no Mil/Aero alternative existed or in
non-critical applications.

• However, key performance parameters (size, weight,
and power (SWaP), as well as processing system
performance) began to drive the usage of COTS into
mainstream applications within the Agency.

• Example: the history of space data recorders

1960s-70s - Magnetic Core Memory
1970s-80s - Magnetic Tape Recorder
1990s - Solid State Recorders (SSRs) - Static Random Access Memory (SRAM)
Late 1990s - SSR - Dynamic Random Access Memory (DRAM)
Early 2010s - SSR - FLASH
NASA’s Traditional Approach to
Using COTS Electronics

• The classic approach was to upscreen:

— Perform a series of tests over extended environment/lifetime
parameters coupled with application usage information to
determine if a part can meet a mission’s reliability/availability
constraints.

— This includes temperature, vacuum, radiation, shock, vibration,
etc.

• While the confidence in the reliability/availability of this
approach may be less than electronics designed for the
harsh space environment, sufficient risk reduction may
be achieved.

— Starting around 1990, NASA missions that had multi-year
operation or significant radiation requirements began coupling
COTS parts into systems usually with a salient mix of Mil/Aero
parts and fault tolerant architectures.
Example 1:
Solar Anomalous Magnetospheric Particle
Explorer (SAMPEX)

• On November 13, 2012, the SAMPEX
spacecraft reentered the Earth’s
atmosphere.*

• SAMPEX, the first of NASA’s Small Explorer
(SMEX) spacecraft, was launched in 1992
with a three-year design lifetime (five-year
goal).

• It lasted operationally nearly twenty years
due to a combination of thorough testing, electronic parts
selection, and system architecture, thrilling
the scientific investigators who were able to
obtain tremendous new scientific data.*

• One should note that the entire spacecraft
was designed, built, and validated in three
years (1989-1992) by NASA.

— Its orbit was a slightly eccentric low Earth polar
orbit.

Image: https://www.nasa.gov/images/content/700355main_sampex_fulll.jpg

* Karen C. Fox, “NASA’s SAMPEX Mission: A Space Weather Warrior,” NASA/GSFC, Nov. 1, 2012,
http://www.nasa.gov/mission_pages/sunearth/news/sampex-deorbit.html
SAMPEX’s Command and Data Handling
(CADH) System -
The Small Explorer Data System (SEDS)

• SEDS was built upon traditionally competing
ideas:
— Increasing spacecraft performance, and,
— Having a high reliability/availability spacecraft.
• This led, in itself, to two concepts for the CADH:
— Selection of commercial and new electronics
technologies, and,
— Detailed evaluation (technology), qualification, and
validation planning.
• The SEDS approach became the cornerstone
philosophy and system design for generations of
spacecraft that followed.
The SEDS Architecture

• Development and first use of a
fiber optic data bus (MIL-STD-1773).

— This included selection and testing of the
optical and electrical components,
protocol electronics, connectors,
couplers, and optical fiber.

— Radiation testing was partnered with U.S.
Department of Defense (DoD) (Naval
Research Laboratory) which has led to
continued collaboration between our
organizations.

• MIL-STD-1773 was also the first
NASA move away from traditional
custom parallel bus structures for
data/command transfer to serial
bus structure.

— This simplified interconnects and was a
size, weight, and power (SWaP) savings.

— The underlying electrical protocol, MIL-STD-1553,
is still in common use across
the space industry and paved the way for
newer generations of databus
implementations such as SpaceWire.

Figure 11: SAMPEX 1773 retries over Mercator projection (SAMPEX 1773 retries, Sep 14 - Dec 15, 1992),
after K.A. LaBel, et al, “SEDS MIL-STD-1773 Fiber Optic
Data Bus: Proton Irradiation Test Results and Spaceflight
SEU Data,” IEEE Transactions on Nuclear Science, Vol. 40,
No. 6, Dec 1993
SEDS Technology: SSR

• First NASA use of COTS SRAM as a
means of building an SSR.

— A Hitachi 32k x 8 SRAM device was used and
tested by the Aerospace Corporation for
radiation tolerance prior to insertion.

— The Air Force (P87 Mission) had flown this SSR
design as an experiment previously.

— In addition, fault tolerance (Hamming Code
Error Detection and Correction (EDAC)) was
included to deal with the expected single event
upset (SEU) radiation hits.

• The SSR was also the first use of
surface mount technology (SMT) in a
NASA spacecraft.

— SMT replaced through-hole mounting of
devices to printed circuit boards (PCBs), thus
allowing for two-sided PCB usage and more
compact (physical) designs.

— A detailed series of thermal vacuum and
shock/vibration testing was performed on test
coupons to determine “safe usage” and rules
were developed for the SAMPEX products and
subsequently used by other NASA missions.

Figure: P87-2, circa 1990, first known spaceflight SSR.
Air Force release picture from the P87-2 mission (aka Stacksat),
http://www.thespacereview.com/article/2104/1

Figure after C.M. Seidleck, et al, “Single Event Effect Flight Data
Analysis of Multiple NASA Spacecraft and Experiments;
Implications to Spacecraft Electrical Designs,” IEEE
Proceedings of the Third European Conference on Radiation
and its Effects on Components and Systems, 18-22 Sept. 1995
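
The Hamming-code EDAC mentioned for the SSR can be illustrated with the textbook Hamming(7,4) code, which corrects any single flipped bit in a 7-bit codeword. This is a sketch only: the presentation does not state the word size or code layout used on SAMPEX, so the (7,4) layout and the function names here are assumptions for illustration.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Bit positions (1-based): p1 p2 d1 p3 d2 d3 d4.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c: list[int]) -> tuple[list[int], int]:
    """Return (data bits, corrected position); position 0 means no error seen."""
    c = list(c)  # work on a copy so the caller's word is untouched
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # equals the 1-based position of a single flipped bit
    if syndrome:
        c[syndrome - 1] ^= 1          # correct the upset bit
    return [c[2], c[4], c[5], c[6]], syndrome


# Simulate a single event upset in a stored word.
stored = hamming74_encode([1, 0, 1, 1])
upset = list(stored)
upset[4] ^= 1                          # flip the bit at position 5
data, position = hamming74_decode(upset)
print(data, position)
```

A plain (7,4) code corrects one bit per word and will miscorrect a double-bit error; flight EDAC schemes typically add an overall parity bit to detect that case, which is omitted here to keep the sketch short.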
SEDS Technology: COTS 32-bit Processor

• The first use of a commercial 32-bit processor in a NASA
spacecraft (Intel 80386 and its peripheral support ICs).

• This, in and of itself, drove a number of new features:
— Extensive radiation test campaign by GSFC and JPL on the 80386 processor
family at the part level. This drove initial designs for fault tolerance.
— A seven layer fault tolerant system that included:
a watchdog processor,
software task monitors,
multi-day timeout, and more.
• Key feature: the fault tolerance was based on dissimilar strings.
A radiation hardened 80C86RH processor was used as a watchdog for the main processor.
— A full system validation test under radiation exposure (i.e., an engineering
model was taken to a heavy ion test facility along with the full ground system).
• Various chips were exposed sequentially.
• Upsets/anomalies were noted and the system would utilize its fault tolerant features to recover.
• A small number of unrecoverable events were noted and system workarounds were then designed
in. This was teamwork at its best.
— First use of a commercial real-time operating system (RTOS): Ready Systems’
Virtual Real-Time Executive (VRTX) and the “C” programming language.
— Development and use of a deterministic software bus concept.
— First true implementation of the Consultative Committee for Space Data
Systems (CCSDS) “Blue Book” by NASA.
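
The watchdog layer described above can be sketched in software. This is a simplified, hypothetical model: the real SEDS watchdog was a separate radiation hardened processor, and its timing, interfaces, and the other six layers are not described in the presentation. The class and function names below are invented for illustration.

```python
class Watchdog:
    """Issues a reset if the monitored processor fails to check in on time."""

    def __init__(self, timeout_ticks: int) -> None:
        self.timeout_ticks = timeout_ticks
        self.ticks_since_kick = 0
        self.resets = 0

    def kick(self) -> None:
        """Called by a healthy main processor to show it is still running."""
        self.ticks_since_kick = 0

    def tick(self) -> bool:
        """Advance one time step; return True if a reset was issued."""
        self.ticks_since_kick += 1
        if self.ticks_since_kick >= self.timeout_ticks:
            self.resets += 1
            self.ticks_since_kick = 0
            return True
        return False


def run(ticks: int, hang_at: set[int], timeout_ticks: int = 3) -> int:
    """Simulate a main processor that hangs (e.g., after an upset) at given ticks."""
    watchdog = Watchdog(timeout_ticks)
    hung = False
    for t in range(ticks):
        if t in hang_at:
            hung = True            # processor stops servicing the watchdog
        if not hung:
            watchdog.kick()
        if watchdog.tick():
            hung = False           # the reset restores normal operation
    return watchdog.resets


print(run(ticks=20, hang_at={2, 12}))
```

The design point worth noting is the one the slide calls out: the monitor should not share failure modes with what it monitors, which is why a dissimilar, hardened part was used rather than a second copy of the main processor.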
Example 2: Mars Science Laboratory (MSL)

• “Curiosity” Rover

• Landed on Mars in August
2012, with planned ~700 day
mission

• Currently still functioning,
about 1700 days.

• Critical “7 minutes of terror”
window during landing
— No interaction with ground

— Any problems (such as stochastic
radiation events) would have to be
handled automatically.

— Good example of system where
“second chance” approach could
improve chances for success.
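
The exposure of a short critical window to stochastic radiation events can be estimated with a simple calculation. The sketch below assumes events arrive as a Poisson process with a constant rate; both that model and the example rate are assumptions for illustration, not MSL data.

```python
import math


def prob_at_least_one_event(rate_per_day: float, window_minutes: float) -> float:
    """Probability of one or more events in the window, assuming Poisson arrivals."""
    expected_events = rate_per_day * window_minutes / (24 * 60)
    # 1 - exp(-x), computed with expm1 for accuracy when x is small
    return -math.expm1(-expected_events)


# Hypothetical upset rate, chosen only to show the calculation.
hypothetical_rate_per_day = 0.01
p = prob_at_least_one_event(hypothetical_rate_per_day, window_minutes=7)
print(f"P(at least one event in 7 minutes) = {p:.2e}")
```

Even when the probability for a single window is small, there is no opportunity for ground intervention if the event does occur, which is why automatic recovery matters for this phase.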
Synchronous Dynamic Random Access
Memory (SDRAM):
Common Memory Solution

• MSL uses COTS SDRAMs in the Rover Compute
Element (RCE)
— Many radiation-related error modes are known in SDRAMs

• MSL was designed to mitigate these error modes
— Primarily through error detection and correction (EDAC)
— Note: It is possible to mitigate ALL possible error modes in
an SDRAM, using the IBM Chipkill™ technology, for example
• Requires more complicated design
• Difficult to fit into spacecraft SWaP
• Similar devices are used in other NASA missions.
— Including parts from the same wafer lot
• In the MSL case, the devices were architecturally
identical to devices used in the Juno mission
— Exception: Factory-set configuration options are different
Analysis for Juno helps MSL

• Juno spacecraft currently in orbit
around Jupiter
— Launched August 2011
— Arrived at Jupiter July 2016

• Shortly after launch, Juno
experienced single event functional
interrupts (SEFIs)

• The SEFIs did not significantly
impact Juno mission performance,
but could they affect MSL?

• The data collected for Juno indicated
a potential risk during MSL landing.
— “Second chance” software approach
was able to incorporate this
information

• Engineers also improved mitigation of
this type of event before MSL launch

Figure: Error maps from data taken for Juno. SEFIs appear as bands and red dots (not visible).
NASA’s Changing Landscape

• With NASA’s new era of commercial providers and small
space missions (e.g., CubeSats), other approaches are
being considered to find more cost-effective ways of
meeting mission requirements.

• A few of the considerations for this emerging space
include, but are not limited to:
— Increased reliance on fault tolerance, architectural approaches, and
even constellation spacecraft sparing,
— Leverage of the improved defect reliability of high yield COTS,
automotive, industrial, and medical grades of electronics,
— Use of higher-assembly level testing,
— Reliance on new tools for model-based mission assurance (MBMA),
circuit simulation and verification, as well as physics of failure (PoF),
and,
— Improved communication on considerations, lessons learned and
guidelines.
The Modern Approach to EEE Parts

• The determination of acceptability for device
usage is a complex trade space.
— Every engineer will “solve” a problem differently:
• e.g., software versus hardware solutions.
• The following chart illustrates a risk matrix
approach for EEE parts based on:
— Environment exposure,
— Mission lifetime, and,
— Criticality of implemented function.
• Notes:
— “COTS” implies any grade that is not space qualified
and radiation hardened.
— Level 1 and 2 refer to traditional space qualified EEE
parts.

(Chart: EEE parts risk matrix; recoverable axis label: Criticality)
A Few Details on the “Matrix”

• When to test:
— “Optional”
• Implies that you might get away without this, but there’s residual risk.
— “Suggested”
• Implies that it is a good idea to do this, and likely some risk if you don’t.
— “Recommended”
• Implies that this really should be done or you’ll definitely have some
risk.
— Where just the item is listed (like “full upscreening for COTS”)
• This should be done to meet the criticality and environment/lifetime
concerns.

• The higher the level of risk acceptance by a mission, the higher
the consideration for performing alternate assembly level testing
versus traditional part level.

• All fault tolerance must be validated.

Good mission planning identifies where on the matrix an EEE part lies.
Lessons Learned on COTS for Space (1)

• In an ideal world (and given limitations of
full state space coverage), you’d want to:
— Test at the device level to provide input for
fault tolerant design, and,
— Test at the system level to validate design
approaches
• Possibly uncover additional fault modes (statistics of
test coverage).

• Lots of folks are trying to do the second and
mistakenly calling it qualification when it’s
really “system validation” (with inherent
risk).
Lessons Learned on COTS for Space (2)

• Understanding the criticality of the application is
the key to performing adequate testing and
validation for risk management
— However, even “good” ground testing and designs can
be surprised due to the random/Markov nature of SEEs and
challenges related to the “completeness” of ground
beam testing (coverage of targets and operating states)

• Improving data sharing between not only NASA
projects, but the greater aerospace industry leads
to improved failure mode knowledge
— Required as input for designers and for efficient
determination of additional data needed
— MSL learned from Juno in a critical functionality area
• What might have happened without it?
Summary

• We have provided an overview of NASA COTS
electronics usage.

• This has included
— Background material on the challenge for COTS in
space,
— Two examples of successes with COTS in space,
— A discussion of a recommended assurance approach,
and,
— A few lessons learned as takeaways.